Neural Illumination: Lighting Prediction for Indoor Environments
This paper addresses the task of estimating the light arriving from all
directions to a 3D point observed at a selected pixel in an RGB image. This
task is challenging because it requires predicting a mapping from a partial
scene observation by a camera to a complete illumination map for a selected
position, which depends on the 3D location of the selection, the distribution
of unobserved light sources, the occlusions caused by scene geometry, etc.
Previous methods attempt to learn this complex mapping directly using a single
black-box neural network, which often fails to estimate high-frequency lighting
details for scenes with complicated 3D geometry. Instead, we propose "Neural
Illumination", a new approach that decomposes illumination prediction into
several simpler differentiable sub-tasks: 1) geometry estimation, 2) scene
completion, and 3) LDR-to-HDR estimation. The advantage of this approach is
that the sub-tasks are relatively easy to learn and can be trained with direct
supervision, while the whole pipeline is fully differentiable and can be
fine-tuned with end-to-end supervision. Experiments show that our approach
performs significantly better both quantitatively and qualitatively than prior work.
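The three-stage decomposition can be sketched as a composition of differentiable modules. The functions below are illustrative placeholders of our own (not the paper's networks), standing in for learned geometry-estimation, scene-completion, and LDR-to-HDR stages:

```python
import numpy as np

# Hypothetical stand-ins for the three learned sub-networks; in a real
# implementation each would be a differentiable neural module.
def estimate_geometry(rgb, pixel):
    """Sub-task 1: predict geometry and warp the observation into a
    partial panorama centered at the selected 3D point (placeholder)."""
    h, w, _ = rgb.shape
    return np.zeros((h, w * 2, 3))  # partial LDR panorama

def complete_scene(partial_pano):
    """Sub-task 2: hallucinate the unobserved panorama regions
    (placeholder: identity)."""
    return partial_pano

def ldr_to_hdr(ldr_pano):
    """Sub-task 3: map the completed LDR panorama to HDR illumination
    (placeholder: a crude inverse-gamma curve)."""
    return np.clip(ldr_pano, 0.0, 1.0) ** 2.2

def neural_illumination(rgb, pixel):
    # The sub-tasks compose into one pipeline: each stage can be trained
    # with direct supervision, then the chain fine-tuned end to end.
    partial = estimate_geometry(rgb, pixel)
    completed = complete_scene(partial)
    return ldr_to_hdr(completed)

hdr = neural_illumination(np.random.rand(64, 128, 3), pixel=(32, 64))
print(hdr.shape)  # (64, 256, 3)
```

The point of the structure, per the abstract, is that each stage admits direct supervision while the composition remains differentiable.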
Rearrangement Planning for General Part Assembly
Most successes in autonomous robotic assembly have been restricted to a single
target or category. We propose to investigate general part assembly, the task
of creating novel target assemblies with unseen part shapes. As a fundamental
step to a general part assembly system, we tackle the task of determining the
precise poses of the parts in the target assembly, which we term
"rearrangement planning". We present General Part Assembly Transformer
(GPAT), a transformer-based model architecture that accurately predicts part
poses by inferring how each part shape corresponds to the target shape. Our
experiments on both 3D CAD models and real-world scans demonstrate GPAT's
generalization abilities to novel and diverse target and part shapes.
Project website: https://general-part-assembly.github.io
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop
While there has been remarkable progress in the performance of visual
recognition algorithms, the state-of-the-art models tend to be exceptionally
data-hungry. Large labeled training datasets, expensive and tedious to produce,
are required to optimize millions of parameters in deep network models. Lagging
behind the growth in model capacity, the available datasets are quickly
becoming outdated in terms of size and density. To circumvent this bottleneck,
we propose to amplify human effort through a partially automated labeling
scheme, leveraging deep learning with humans in the loop. Starting from a large
set of candidate images for each category, we iteratively sample a subset, ask
people to label them, classify the others with a trained model, split the set
into positives, negatives, and unlabeled based on the classification
confidence, and then iterate with the unlabeled set. To assess the
effectiveness of this cascading procedure and enable further progress in visual
recognition research, we construct a new image dataset, LSUN. It contains
around one million labeled images for each of 10 scene categories and 20 object
categories. We experiment with training popular convolutional networks and find
that they achieve substantial performance gains when trained on this dataset.
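The cascading labeling procedure described above can be sketched as a loop: sample a subset, have people label it, train a model on all human labels, score the remainder, and send only the ambiguous middle band back into the loop. The sketch below is a generic illustration with hypothetical callbacks (`ask_humans`, `train_and_score`), not the LSUN codebase:

```python
import numpy as np

rng = np.random.default_rng(0)

def label_loop(candidates, ask_humans, train_and_score,
               hi=0.9, lo=0.1, batch=1000):
    """Partially automated labeling with humans in the loop."""
    positives, negatives, unlabeled = [], [], list(candidates)
    while unlabeled:
        # 1) Sample a subset and ask people to label it.
        k = min(batch, len(unlabeled))
        idx = set(rng.choice(len(unlabeled), size=k, replace=False))
        sample = [x for i, x in enumerate(unlabeled) if i in idx]
        labels = ask_humans(sample)
        positives += [x for x, y in zip(sample, labels) if y]
        negatives += [x for x, y in zip(sample, labels) if not y]
        rest = [x for i, x in enumerate(unlabeled) if i not in idx]
        if not rest:
            break
        # 2) Train on all human labels so far, classify the rest.
        scores = train_and_score(positives, negatives, rest)
        # 3) Split by classification confidence; only the ambiguous
        #    middle band is iterated on again.
        positives += [x for x, s in zip(rest, scores) if s >= hi]
        negatives += [x for x, s in zip(rest, scores) if s <= lo]
        unlabeled = [x for x, s in zip(rest, scores) if lo < s < hi]
    return positives, negatives
```

The human effort amplification comes from step 3: confidently classified images never reach an annotator, so human labels are spent only where the model is unsure.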
REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction
The ability to detect and analyze failed executions automatically is crucial
for an explainable and robust robotic system. Recently, Large Language Models
(LLMs) have demonstrated strong common sense reasoning skills on textual
inputs. To leverage the power of LLMs for robot failure explanation, we propose
REFLECT, a framework that converts multi-sensory data into a hierarchical
summary of the robot's past experiences and queries the LLM with a progressive
failure explanation algorithm. Conditioned on the explanation, a failure correction
planner generates an executable plan for the robot to correct the failure and
complete the task. To systematically evaluate the framework, we create the
RoboFail dataset and show that our LLM-based framework is able to generate
informative failure explanations that assist successful correction planning.
Project website: https://roboreflect.github.io
ASPiRe: Adaptive Skill Priors for Reinforcement Learning
We introduce ASPiRe (Adaptive Skill Prior for RL), a new approach that
leverages prior experience to accelerate reinforcement learning. Unlike
existing methods that learn a single skill prior from a large and diverse
dataset, our framework learns a library of distinct skill priors
(i.e., behavior priors) from a collection of specialized datasets, and learns
how to combine them to solve a new task. This formulation allows the algorithm
to acquire a set of specialized skill priors that are more reusable for
downstream tasks; however, it also brings up additional challenges of how to
effectively combine these unstructured sets of skill priors to form a new prior
for new tasks. Specifically, it requires the agent not only to identify which
skill prior(s) to use but also to determine how to combine them (either
sequentially or concurrently) to form a new prior. To achieve this goal, ASPiRe
includes an Adaptive Weight Module (AWM) that learns to infer an adaptive weight assignment
between different skill priors and uses them to guide policy learning for
downstream tasks via weighted Kullback-Leibler divergences. Our experiments
demonstrate that ASPiRe can significantly accelerate the learning of new
downstream tasks in the presence of multiple priors and improves over
competitive baselines.
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
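The weighted-KL guidance term can be written down concretely for diagonal-Gaussian policies and priors. The sketch below is a generic illustration of such an objective term under that Gaussian assumption; the AWM that actually infers the weights is not modeled here, and the function names are ours, not the paper's:

```python
import numpy as np

def kl_diag_gauss(mu_q, logstd_q, mu_p, logstd_p):
    """KL(q || p) between diagonal Gaussians, summed over action dims."""
    var_q, var_p = np.exp(2 * logstd_q), np.exp(2 * logstd_p)
    return np.sum(logstd_p - logstd_q
                  + (var_q + (mu_q - mu_p) ** 2) / (2 * var_p) - 0.5)

def weighted_prior_kl(policy, priors, weights):
    """Regularizer guiding policy learning: a convex combination of KL
    terms, one per skill prior, with weights supplied by the (here
    hypothetical) Adaptive Weight Module."""
    assert np.isclose(sum(weights), 1.0), "weights should sum to 1"
    return sum(w * kl_diag_gauss(*policy, *p)
               for w, p in zip(weights, priors))

# Toy example: a 2D-action policy, one matching and one mismatched prior.
policy = (np.zeros(2), np.zeros(2))          # (mean, log-std)
priors = [(np.zeros(2), np.zeros(2)),        # identical to the policy
          (np.array([2.0, 0.0]), np.zeros(2))]
print(weighted_prior_kl(policy, priors, [1.0, 0.0]))  # 0.0
```

Putting all the weight on the matching prior makes the penalty vanish, while shifting weight toward the mismatched prior pulls the policy toward that prior's behavior instead.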